Goto

Collaborating Authors

 candidate type


KGLink: A column type annotation method that combines knowledge graph and pre-trained language model

arXiv.org Artificial Intelligence

The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods encounter difficulties annotating columns when there is no match for column cells in the KG. Moreover, KG-based methods can provide multiple predictions for one column, making it challenging to determine the semantic type with the most suitable granularity for the dataset. This type granularity issue limits their scalability. On the other hand, deep learning-based methods face challenges related to the valuable context missing issue. This occurs when the information within the table is insufficient for determining the correct column type. This paper presents KGLink, a method that combines WikiData KG information with a pre-trained deep learning language model for table column annotation, effectively addressing both type granularity and valuable context missing issues. Through comprehensive experiments on widely used tabular datasets encompassing numeric and string columns with varying type granularity, we showcase the effectiveness and efficiency of KGLink. By leveraging the strengths of KGLink, we successfully surmount challenges related to type granularity and valuable context issues, establishing it as a robust solution for the semantic annotation of tabular data.


OntoType: Ontology-Guided Zero-Shot Fine-Grained Entity Typing with Weak Supervision from Pre-Trained Language Models

arXiv.org Artificial Intelligence

Fine-grained entity typing (FET), which assigns entities in text with context-sensitive, fine-grained semantic types, will play an important role in natural language understanding. A supervised FET method, which typically relies on human-annotated corpora for training, is costly and difficult to scale. Recent studies leverage pre-trained language models (PLMs) to generate rich and context-aware weak supervision for FET. However, a PLM may still generate a mixture of rough and fine-grained types, or tokens unsuitable for typing. In this study, we vision that an ontology provides a semantics-rich, hierarchical structure, which will help select the best results generated by multiple PLM models and head words. Specifically, we propose a novel zero-shot, ontology-guided FET method, OntoType, which follows a type ontological structure, from coarse to fine, ensembles multiple PLM prompting results to generate a set of type candidates, and refines its type resolution, under the local context with a natural language inference model. Our experiments on the Ontonotes, FIGER, and NYT datasets using their associated ontological structures demonstrate that our method outperforms the state-of-the-art zero-shot fine-grained entity typing methods. Our error analysis shows that refinement of the existing ontology structures will further improve fine-grained entity typing.


Automatic Semantic Modeling for Structural Data Source with the Prior Knowledge from Knowledge Base

arXiv.org Artificial Intelligence

A critical step in sharing semantic content online is to map the structural data source to a public domain ontology. This problem is denoted as the Relational-To-Ontology Mapping Problem (Rel2Onto). A huge effort and expertise are required for manually modeling the semantics of data. Therefore, an automatic approach for learning the semantics of a data source is desirable. Most of the existing work studies the semantic annotation of source attributes. However, although critical, the research for automatically inferring the relationships between attributes is very limited. In this paper, we propose a novel method for semantically annotating structured data sources using machine learning, graph matching and modified frequent subgraph mining to amend the candidate model. In our work, Knowledge graph is used as prior knowledge. Our evaluation shows that our approach outperforms two state-of-the-art solutions in tricky cases where only a few semantic models are known.


Large-scale Evaluation of Transformer-based Article Encoders on the Task of Citation Recommendation

arXiv.org Artificial Intelligence

Recently introduced transformer-based article encoders (TAEs) designed to produce similar vector representations for mutually related scientific articles have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, the existing benchmark datasets are predominantly focused on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in candidate pools. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare the performance of TAEs with a lexical retrieval baseline model BM25 on the task of citation recommendation, where the model produces a list of recommendations for citing in a given input article. We find out that BM25 is still very competitive with the state-of-the-art neural retrievers, a finding which is surprising given the strong performance of TAEs on small benchmarks. As a remedy for the limitations of the existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations: Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.


Bribery as a Measure of Candidate Success: Complexity Results for Approval-Based Multiwinner Rules

arXiv.org Artificial Intelligence

We study the problem of bribery in multiwinner elections, for the case where the voters cast approval ballots (i.e., sets of candidates they approve) and the bribery actions are limited to: adding an approval to a vote, deleting an approval from a vote, or moving an approval within a vote from one candidate to the other. We consider a number of approval-based multiwinner rules (AV, SAV, GAV, RAV, approval-based Chamberlin--Courant, and PAV). We find the landscape of complexity results quite rich, going from polynomial-time algorithms through NP-hardness with constant-factor approximations, to outright inapproximability. Moreover, in general, our problems tend to be easier when we limit out bribery actions on increasing the number of approvals of the candidate that we want to be in a winning committee (i.e., adding approvals only for this preferred candidate, or moving approvals only to him or her). We also study parameterized complexity of our problems, with a focus on parameterizations by the numbers of voters or candidates.


A Novel CNet-assisted Evolutionary Level Repairer and Its Applications to Super Mario Bros

arXiv.org Artificial Intelligence

Applying latent variable evolution to game level design has become more and more popular as little human expert knowledge is required. However, defective levels with illegal patterns may be generated due to the violation of constraints for level design. A traditional way of repairing the defective levels is programming specific rule-based repairers to patch the flaw. However, programming these constraints is sometimes complex and not straightforward. An autonomous level repairer which is capable of learning the constraints is needed. In this paper, we propose a novel approach, CNet, to learn the probability distribution of tiles giving its surrounding tiles on a set of real levels, and then detect the illegal tiles in generated new levels. Then, an evolutionary repairer is designed to search for optimal replacement schemes equipped with a novel search space being constructed with the help of CNet and a novel heuristic function. The proposed approaches are proved to be effective in our case study of repairing GAN-generated and artificially destroyed levels of Super Mario Bros. game. Our CNet-assisted evolutionary repairer can also be easily applied to other games of which the levels can be represented by a matrix of objects or tiles.


KCAT: A Knowledge-Constraint Typing Annotation Tool

arXiv.org Artificial Intelligence

Fine-grained Entity Typing is a tough task which suffers from noise samples extracted from distant supervision. Thousands of manually annotated samples can achieve greater performance than millions of samples generated by the previous distant supervision method. Whereas, it's hard for human beings to differentiate and memorize thousands of types, thus making large-scale human labeling hardly possible. In this paper, we introduce a Knowledge-Constraint Typing Annotation Tool (KCAT), which is efficient for fine-grained entity typing annotation. KCAT reduces the size of candidate types to an acceptable range for human beings through entity linking and provides a Multi-step Typing scheme to revise the entity linking result. Moreover, KCAT provides an efficient Annotator Client to accelerate the annotation process and a comprehensive Manager Module to analyse crowdsourcing annotations. Experiment shows that KCAT can significantly improve annotation efficiency, the time consumption increases slowly as the size of type set expands.